import networkx as nx
import networkx.algorithms.community as nx_comm
import json
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import powerlaw
import numpy as np
from sklearn.linear_model import LinearRegression
import statistics
from scipy import stats
import matplotlib.patches as mpatches
seed = 42
Network Summary¶
Data mining and preprocessing, general characteristics, basic visualization¶
This network represents figure skaters following each other on Instagram (designated as an extremist organization by the Russian government). The data was collected manually by downloading HTML pages from Instagram accounts. Since Instagram renders pages dynamically, I used the copy_outer_html function to extract the markup, which was then parsed with BeautifulSoup. Only publicly available accounts were collected to comply with Instagram's policies.
The downloaded data can be found in the insta_follows folder, the complete preprocessing workflow can be found in data_parsing.py, and the graph itself in JSON-format is located in fs_inst_following.json. Graph attributes, including point scores and full names, were collected from the Official ISU website. Links to the pages were manually gathered and can be found in FS_nicks_links.csv. In cases where automatic parsing was not possible, scores were calculated manually and stored in this CSV table. The parsing code is available in isu_site_parsing.py.
The data was collected using a balanced sampling approach. Figure skating comprises four disciplines: women's singles, men's singles, pairs, and ice dance. To minimize bias, an equal number of skaters was selected from each category: 26 women's singles skaters, 26 men's singles skaters, 26 pairs skaters (comprising 13 men and 13 women), 26 ice dance skaters (similarly, 13 men and 13 women). In total, 104 accounts were analyzed.
The network thus has the following characteristics:
- each node represents an athlete's Instagram account; there are 104 nodes in total;
- overall, 2324 edges are represented;
- each directed edge represents a "following" relationship between two accounts, meaning the network is directed;
- all edges have equal weight, meaning the network is unweighted;
- the set of nodes is homogeneous, as all individuals belong to the same professional community (elite figure skating).
The nodes have the following attributes:
- figure skating discipline;
- a merit-based point score for each skater, calculated according to ISU World Standings and Season's World Ranking;
- the athlete's full name.
All mentioned scripts, full dataset, this notebook, and the project presentation are available on the project's GitHub page.
with open('fs_inst_following.json', 'r', encoding='utf-8') as f:
    graph_dict = json.load(f)

G = nx.DiGraph()
G.add_nodes_from(graph_dict.keys())
for follower, following_list in graph_dict.items():
    for following in following_list:
        if following in graph_dict:
            G.add_edge(follower, following)
# short descriptives
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
Nodes: 104, Edges: 2324
with open("graph_attributes.json") as f:
    graph_attributes = json.load(f)
for node in G.nodes:
    for attr_name, attr_value in graph_attributes[node].items():
        G.nodes[node][attr_name] = attr_value
nx.write_gexf(G, 'graph.gexf')
# basic visualization
plt.figure(figsize=(16, 12))
pos = nx.spring_layout(G, k=2, iterations=50, seed=42)
nx.draw_networkx_nodes(G, pos, node_size=400, node_color='lightblue', alpha=0.9)
nx.draw_networkx_edges(G, pos, edge_color='gray', arrows=True, arrowsize=20, width=1.5, alpha=0.7)
nx.draw_networkx_labels(G, pos, font_size=9, font_weight='bold')
plt.axis('off')
plt.tight_layout()
plt.show()
Degree distribution: checking whether the network is scale-free¶
To determine whether a network exhibits scale-free properties, we need to examine whether its degree distribution follows a power law. Since I was working with a directed graph, I analyzed the distribution of total degrees, calculated by summing the in-degree and out-degree of every node.
The Kolmogorov-Smirnov test was used to assess the goodness of fit between the empirical degree distribution and a theoretical power-law model. To perform this analysis, a Fit object from the powerlaw library was created. It automatically fits a power-law distribution to the data by finding the optimal lower bound (xmin) where power-law behavior begins. The quality of the fit is quantified by the Kolmogorov-Smirnov statistic D (the .power_law.KS() method), which measures the maximum distance between the CDF of the empirical data and the CDF of the fitted theoretical power-law model. A smaller D value indicates a better fit.
degrees = [G.in_degree()[node] + G.out_degree()[node] for node in G.nodes]
fit = powerlaw.Fit(degrees, discrete=True, verbose=False)
print(f"Optimal lower bound: {fit.xmin}")
print(f"Power law exponent (alpha): {fit.alpha:.3f}")
print(f"D-value of Kolmogorov-Smirnov test: {fit.power_law.KS():.3f}")
Optimal lower bound: 49.0
Power law exponent (alpha): 4.710
D-value of Kolmogorov-Smirnov test: 0.115
The analysis of the empirical degree distribution using the Kolmogorov-Smirnov goodness-of-fit test indicates that the observed network does not exhibit scale-free properties: the usual threshold for a satisfactory fit is D < 0.05, and we obtained 0.115.
I compared the empirical distribution against a theoretical null model of comparable size and density. For this purpose, the Barabási–Albert random graph model was used to generate a graph with the same number of nodes and an approximately matching average degree. This model was chosen because it generates scale-free random networks.
G_undirected = G.to_undirected()
n = G_undirected.number_of_nodes()
m = int(G_undirected.number_of_edges() / n)
random_BA = nx.barabasi_albert_graph(n, m)
random_degrees = [d for _, d in random_BA.degree()]
fit = powerlaw.Fit(random_degrees, discrete=True, verbose=False)
print(f"Optimal lower bound for random graph: {fit.xmin}")
print(f"Power law exponent (alpha) for random graph: {fit.alpha:.3f}")
print(f"D-value of Kolmogorov-Smirnov test for random graph: {fit.power_law.KS():.3f}")
Optimal lower bound for random graph: 16.0
Power law exponent (alpha) for random graph: 3.044
D-value of Kolmogorov-Smirnov test for random graph: 0.073
As the analysis shows, the random Barabási-Albert network exhibits a degree distribution that follows a power law, whereas the collected network does not fit this model well, as indicated by the Kolmogorov-Smirnov statistics.
Additionally, I conducted a pairwise statistical KS-test to calculate the p-value and determine whether the degree distributions of our empirical network and the random Barabási-Albert model differ significantly.
_, p_value = stats.ks_2samp(degrees, random_degrees)
print(f"P-value: {p_value:.3f}")
P-value: 0.000
The extremely small p-value indicates a statistically significant difference between the degree distributions. Therefore, we reject the null hypothesis that the two distributions are identical. This finding suggests that the observed network is not scale-free.
Average distance¶
Unfortunately, the graph lacks strong connectivity. Consequently, when running the following code:
print('Radius: ', nx.radius(G))
print('Diameter: ', nx.diameter(G))
print('Average shortest path length: ', nx.average_shortest_path_length(G))
I get the following error:
NetworkXError: Found infinite path length because the digraph is not strongly connected.
Some nodes are unreachable from others; therefore, I focused on the largest strongly connected component of the graph.
# getting the list of strongly connected components
strongly_connected = nx.strongly_connected_components(G)
# finding the largest component
largest_component_nodes = max(list(strongly_connected), key=len)
largest_component = G.subgraph(largest_component_nodes)
print(f"Nodes: {largest_component.number_of_nodes()}, Edges: {largest_component.number_of_edges()}")
Nodes: 98, Edges: 2249
Fortunately, the largest strongly connected component does not differ much from the original graph: it contains only 6 fewer nodes and 75 fewer edges. This means we can conduct the distance analysis with reasonable confidence. The clustering coefficient was calculated for the full graph.
print(f'Actual Radius: {nx.radius(largest_component)}')
print(f'Actual Diameter: {nx.diameter(largest_component)}')
print(f'Actual average shortest path length: {nx.average_shortest_path_length(largest_component):.3f}')
print(f'Actual average degree: {sum(dict(G.degree()).values()) / G.number_of_nodes():.3f}')
print(f"Actual clustering coefficient: {nx.average_clustering(G):.3f}")
Actual Radius: 3
Actual Diameter: 4
Actual average shortest path length: 1.871
Actual average degree: 44.692
Actual clustering coefficient: 0.390
Now let's compare these results to a random network generated by the Watts-Strogatz algorithm to check whether the network is a small world. If the metrics do not differ substantially, we can say that our network exhibits small-world features, under the assumption that such random networks do.
n = G_undirected.number_of_nodes()
avg_degree = sum(dict(G_undirected.degree()).values()) / n
k = int(round(avg_degree))
random_WS = nx.watts_strogatz_graph(n, k, p=0.1, seed=seed)
print(f'Theoretical random network Radius: {nx.radius(random_WS)}')
print(f'Theoretical random network Diameter: {nx.diameter(random_WS)}')
print(f'Theoretical random network average shortest path length: {nx.average_shortest_path_length(random_WS):.3f}')
print(f'Theoretical random network average degree: {sum(dict(random_WS.degree()).values()) / random_WS.number_of_nodes():.3f}')
print(f"Theoretical random network clustering coefficient: {nx.average_clustering(random_WS):.3f}")
Theoretical random network Radius: 2
Theoretical random network Diameter: 3
Theoretical random network average shortest path length: 1.684
Theoretical random network average degree: 34.000
Theoretical random network clustering coefficient: 0.582
The comparison reveals close average path lengths (1.684 vs. 1.871) and noticeable but not extreme differences in the other metrics. To check whether these differences are statistically meaningful, I performed a Kolmogorov-Smirnov test on the degree distributions of the two networks.
ws_degrees = [d for _, d in random_WS.degree()]
_, p_value = stats.ks_2samp(degrees, ws_degrees)
print(f"P-value: {p_value:.3f}")
P-value: 0.000
The statistical test revealed different degree distributions, yet the average path lengths remain nearly identical (1.871 vs 1.684). This suggests that while connection patterns differ from random networks, the network still may achieve efficient global connectivity.
The lower clustering coefficient (0.390 vs 0.582) indicates that skaters' Instagram followings form fewer closed triangles than random connections would. This suggests professional rather than social networking patterns, where following reflects career awareness rather than mutual friendships. Combined with the short average path lengths, this reveals an efficiently connected but locally sparse professional network.
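The metrics above can be folded into a single back-of-envelope small-world coefficient σ = (C/C_rand)/(L/L_rand). This is only a rough sketch: σ is conventionally computed against an Erdős–Rényi baseline, whereas here the baseline is the Watts-Strogatz graph (which already has high clustering), so the number is indicative rather than conclusive.

```python
# Back-of-envelope small-world coefficient from the metrics reported above.
# Conventionally sigma > 1 suggests small-world structure; note the baseline
# here is a Watts-Strogatz graph rather than the usual Erdos-Renyi one.
C, C_rand = 0.390, 0.582   # clustering coefficients (observed vs random)
L, L_rand = 1.871, 1.684   # average shortest path lengths (observed vs random)

sigma = (C / C_rand) / (L / L_rand)
print(f"Small-world coefficient sigma: {sigma:.3f}")
```

With these numbers σ ≈ 0.60; the high-clustering WS baseline pushes σ down, so this should not be over-interpreted, but it is consistent with the mixed picture the individual metrics paint.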
Gorgeous network layout¶
Since I have collected each skater's ISU score and discipline, let's visualize the network accordingly. The score distribution is as follows:
isu_scores = [data['score'] for node, data in G.nodes(data=True)]
plt.figure(figsize=(20, 6))
plt.subplot(1, 2, 1)
n, bins, patches = plt.hist(isu_scores, bins=10, edgecolor='white', linewidth=1.5)
colors = plt.cm.Blues(np.linspace(0.2, 0.9, len(patches)))
for patch, color in zip(patches, colors):
    patch.set_facecolor(color)
plt.title('ISU Scores Distribution', fontsize=14, fontweight='bold', pad=15)
plt.xlabel('Score', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.grid(False)
plt.savefig('visualisations/scores_distribution.png', dpi=400)
There is a small but notable group of athletes with scores > 10,000. Let's create a visualization with the following features:
- node color represents discipline;
- node size represents ISU score;
- labels (i.e., skaters' real names, not Instagram nicknames) are displayed only for nodes with scores > 10,000;
- edges connected to these high-scoring nodes are shown with full opacity, while other edges have transparency.
inG = G.to_undirected()
disciplines = [inG.nodes[node]['discipline'] for node in inG.nodes()]
unique_disciplines = sorted(set(disciplines))  # sorted so the color mapping is reproducible
discipline_mapping = {disc: i for i, disc in enumerate(unique_disciplines)}
colormap = plt.colormaps.get_cmap('Pastel1')
node_colors = [colormap(discipline_mapping[disc]) for disc in disciplines]
scores = [inG.nodes[node]['score'] for node in inG.nodes()]
min_size, max_size = 5, 1000
score_norm = Normalize(vmin=min(scores), vmax=max(scores))
node_sizes = [min_size + (max_size - min_size) * score_norm(score) for score in scores]
high_score_nodes = []
high_score_labels = {}
for node in inG.nodes():
    score = inG.nodes[node].get('score', 0)
    if score > 10000:
        high_score_nodes.append(node)
        high_score_labels[node] = inG.nodes[node].get('name', node)
edge_alpha = []
edge_width = []
for u, v in inG.edges():
    if u in high_score_nodes and v in high_score_nodes:
        # edges between two high-scoring nodes: full opacity, as described above
        edge_alpha.append(1.0)
        edge_width.append(1.5)
        inG[u][v]['kawaii_weight'] = 20.
    elif u in high_score_nodes or v in high_score_nodes:
        # edges touching one high-scoring node: full opacity, thinner
        edge_alpha.append(1.0)
        edge_width.append(0.5)
        inG[u][v]['kawaii_weight'] = 2
    else:
        # remaining edges are drawn transparent
        edge_alpha.append(0.1)
        edge_width.append(0.1)
        inG[u][v]['kawaii_weight'] = 1
layout = nx.kamada_kawai_layout(inG, weight='kawaii_weight')
all_params = {'nodes_params': {'G': inG,
'pos': layout,
'node_size': node_sizes,
'node_color': node_colors,
'alpha':0.5},
'edges_params': {'G': inG,
'pos': layout,
'alpha': edge_alpha,
'width': edge_width},
'labels_params': {
'G': inG,
'pos': layout,
'labels': high_score_labels
}
}
plt.figure(figsize=(10, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
    mpatches.Patch(color='#defcce', label="Ice Dance"),
    mpatches.Patch(color='#ffe6e6', label="Men's Single"),
    mpatches.Patch(color='#e9dbf9', label="Women's Single"),
    mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/basic_graph.png", dpi=400)
plt.show()
Structural Analysis¶
Differences and similarity with ER/BA/WS models and the reasons for it¶
Finally, I generated a random graph following a Poisson degree distribution (Erdős–Rényi model) to test whether my network's structure is random.
m = G.number_of_edges()
n = G.number_of_nodes()
p = m/(n*(n-1))
random_ER = nx.erdos_renyi_graph(n, p, directed=True, seed=seed)
er_degrees = [random_ER.in_degree()[node] + random_ER.out_degree()[node] for node in random_ER.nodes]
_, p_value = stats.ks_2samp(degrees, er_degrees)
print(f"P-value: {p_value:.3f}")
P-value: 0.000
Performing a Kolmogorov-Smirnov test on the degree distributions yielded a very small p-value, indicating that my network's structure is statistically different from random and therefore unlikely to be explained by chance connections.
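A quick corroborating check, sketched here from the counts reported earlier: in an Erdős–Rényi graph the expected clustering coefficient is approximately the connection probability p, so comparing p with the observed average clustering of 0.390 gives another angle on non-randomness.

```python
# For an Erdos-Renyi graph, the expected clustering coefficient is ~p.
# The observed value (0.390, computed earlier) lying well above p is
# further evidence that the network is not random.
n, m = 104, 2324                # nodes and directed edges, as reported above
p = m / (n * (n - 1))           # connection probability used for the ER model
print(f"ER connection probability p: {p:.3f}")   # ~0.217
print("Observed average clustering: 0.390")
```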
Here's a short summary of what I've learned from comparing my network to the ER/BA/WS models:
| Model | Property tested | p-value of KS test | Other values (observed vs random) | Interpretation |
|---|---|---|---|---|
| Erdős–Rényi | randomness of the network | ≈ 0 | degree distribution significantly different from random | the network's structure is unlikely to be explained by chance connections |
| Barabási–Albert | scale-free property | ≈ 0 | optimal lower bound: 49 vs 27; alpha: 4.710 vs 3.871; KS D-value: 0.115 vs 0.072 | the network is not scale-free; possible reasons: the relatively small network size; a possible imbalance in the skater sample (balance across coaching schools and countries was not checked); stratification by country/coach (these data were not collected, so this cannot be checked in the community-detection part) |
| Watts-Strogatz | small-world property | ≈ 0 | radius: 3 vs 2; diameter: 4 vs 3; average shortest path length: 1.871 vs 1.684; average degree: 44.692 vs 34; clustering coefficient: 0.39 vs 0.582 | the network may have the small-world property |
Centrality measures¶
I calculated various centrality measures to determine whether any correlation exists between centrality measures and point distributions. The following measures were calculated:
- degree centrality;
- closeness centrality;
- betweenness centrality;
- eigenvector centrality.
ISU-score correspondence to structural properties¶
def calculate_correlation_coeff(G, centrality_measure):
    centrality_functions = {'in_degree_centrality': nx.in_degree_centrality,
                            'out_degree_centrality': nx.out_degree_centrality,
                            'closeness_centrality': nx.closeness_centrality,
                            'betweenness_centrality': nx.betweenness_centrality,
                            'eigenvector_centrality': nx.eigenvector_centrality,
                            'pagerank': nx.pagerank}
    measures = centrality_functions[centrality_measure](G)
    centrality_values = []
    isu_scores = []
    for node, data in G.nodes(data=True):
        centrality_values.append(measures[node])
        isu_scores.append(data['score'])
    pearson, p_value = stats.pearsonr(centrality_values, isu_scores)
    return measures, pearson, p_value
# plain degree centrality conflates incoming and outgoing links on a directed graph,
# which is why I calculated the in- and out-degree values separately
in_degree_centrality_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'in_degree_centrality')
print("Results for in_degree_centrality calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
out_degree_centrality_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'out_degree_centrality')
print("Results for out_degree_centrality calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
closeness_centrality_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'closeness_centrality')
print("Results for closeness_centrality calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
betweenness_centrality_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'betweenness_centrality')
print("Results for betweenness_centrality calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
eigenvector_centrality_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'eigenvector_centrality')
print("Results for eigenvector_centrality calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
Results for in_degree_centrality calculations
Pearson correlation: -0.044, P-value: 0.654
===========================================
Results for out_degree_centrality calculations
Pearson correlation: -0.163, P-value: 0.099
===========================================
Results for closeness_centrality calculations
Pearson correlation: -0.022, P-value: 0.826
===========================================
Results for betweenness_centrality calculations
Pearson correlation: -0.104, P-value: 0.294
===========================================
Results for eigenvector_centrality calculations
Pearson correlation: -0.107, P-value: 0.281
===========================================
The analysis reveals no significant relationships between ISU points scores and network centrality measures. All correlations are weak and statistically insignificant (p > 0.05), suggesting that an athlete's network position does not meaningfully predict their competition scores. Network centrality appears unrelated to ISU performance in this dataset.
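Since the score distribution is heavily right-skewed (a handful of skaters exceed 10,000 points), Pearson's linear correlation may understate a monotonic relationship. A robustness check with Spearman's rank correlation could be run the same way; the sketch below mirrors the structure of calculate_correlation_coeff but uses a hypothetical toy graph, not the real data.

```python
import networkx as nx
from scipy import stats

def spearman_vs_centrality(G, centrality_fn):
    """Rank correlation between a centrality measure and node 'score' attributes."""
    measures = centrality_fn(G)
    nodes = list(G.nodes())
    centrality_values = [measures[n] for n in nodes]
    scores = [G.nodes[n]['score'] for n in nodes]
    rho, p_value = stats.spearmanr(centrality_values, scores)
    return rho, p_value

# Illustrative toy graph (not the skater data): scores are hypothetical
toy = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)])
for node, score in zip(toy.nodes(), [1200, 3400, 9800, 500]):
    toy.nodes[node]['score'] = score

rho, p = spearman_vs_centrality(toy, nx.in_degree_centrality)
print(f"Spearman rho: {rho:.3f}, p-value: {p:.3f}")
```

Spearman only assumes monotonicity, so a null result here as well would strengthen the conclusion that network position and competition scores are unrelated.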
Now let's do some manual work and extract the top skaters based on these centrality values.
def get_top_n_by_centrality(centrality_measures, n=5, centrality_name='in-degree centrality', print_things=True, descending=True):
    sorted_nodes = sorted(centrality_measures.items(), key=lambda x: x[1], reverse=descending)
    top_nodes = sorted_nodes[:n]
    if print_things:
        for node, value in top_nodes:
            print(f"{G.nodes[node]['name']}'s {centrality_name} value is {value:.3f}")
    return top_nodes
Top-N: descending order¶
top_skaters_by_indegree = get_top_n_by_centrality(in_degree_centrality_values)
Elizaveta TUKTAMYSHEVA's in-degree centrality value is 0.670
Nathan CHEN's in-degree centrality value is 0.670
Ilia MALININ's in-degree centrality value is 0.641
Evgenia MEDVEDEVA's in-degree centrality value is 0.641
Anna SHCHERBAKOVA's in-degree centrality value is 0.563
top_skaters_by_outdegree = get_top_n_by_centrality(out_degree_centrality_values, centrality_name='out-degree centrality')
Anna SHCHERBAKOVA's out-degree centrality value is 0.485
Isabeau LEVITO's out-degree centrality value is 0.456
Elizaveta TUKTAMYSHEVA's out-degree centrality value is 0.417
Alexandra TRUSOVA's out-degree centrality value is 0.417
Diana DAVIS's out-degree centrality value is 0.417
top_skaters_by_closeness = get_top_n_by_centrality(closeness_centrality_values, centrality_name='closeness centrality')
Elizaveta TUKTAMYSHEVA's closeness centrality value is 0.734
Nathan CHEN's closeness centrality value is 0.728
Ilia MALININ's closeness centrality value is 0.717
Evgenia MEDVEDEVA's closeness centrality value is 0.717
Anna SHCHERBAKOVA's closeness centrality value is 0.676
top_skaters_by_betweenness = get_top_n_by_centrality(betweenness_centrality_values, centrality_name='betweenness centrality')
Elizaveta TUKTAMYSHEVA's betweenness centrality value is 0.055
Isabeau LEVITO's betweenness centrality value is 0.051
Anna SHCHERBAKOVA's betweenness centrality value is 0.042
Ilia MALININ's betweenness centrality value is 0.032
Evgenia MEDVEDEVA's betweenness centrality value is 0.028
top_skaters_by_eigenvectors = get_top_n_by_centrality(eigenvector_centrality_values, centrality_name='eigenvectors centrality')
Elizaveta TUKTAMYSHEVA's eigenvectors centrality value is 0.237
Nathan CHEN's eigenvectors centrality value is 0.235
Evgenia MEDVEDEVA's eigenvectors centrality value is 0.230
Anna SHCHERBAKOVA's eigenvectors centrality value is 0.225
Ilia MALININ's eigenvectors centrality value is 0.215
Top-N: ascending order¶
antitop_skaters_by_indegree = get_top_n_by_centrality(in_degree_centrality_values, descending=False)
Kaetlyn OSMOND's in-degree centrality value is 0.000
Deanna STELLATO-DUDEK's in-degree centrality value is 0.010
Cong HAN's in-degree centrality value is 0.019
Niccolo MACII's in-degree centrality value is 0.019
Bruno MASSOT's in-degree centrality value is 0.029
antitop_skaters_by_outdegree = get_top_n_by_centrality(out_degree_centrality_values, centrality_name='out-degree centrality', descending=False)
Yuzuru HANYU's out-degree centrality value is 0.000
Petr GUMENNIK's out-degree centrality value is 0.000
Yuna KIM's out-degree centrality value is 0.000
Shoma UNO's out-degree centrality value is 0.000
Kiira KORPI's out-degree centrality value is 0.000
antitop_skaters_by_closeness = get_top_n_by_centrality(closeness_centrality_values, centrality_name='closeness centrality', descending=False)
Kaetlyn OSMOND's closeness centrality value is 0.000
Cong HAN's closeness centrality value is 0.331
Deanna STELLATO-DUDEK's closeness centrality value is 0.366
Bruno MASSOT's closeness centrality value is 0.373
Dario CIRISANO's closeness centrality value is 0.382
antitop_skaters_by_betweenness = get_top_n_by_centrality(betweenness_centrality_values, centrality_name='betweenness centrality', descending=False)
Yuzuru HANYU's betweenness centrality value is 0.000
Petr GUMENNIK's betweenness centrality value is 0.000
Yuna KIM's betweenness centrality value is 0.000
Kaetlyn OSMOND's betweenness centrality value is 0.000
Shoma UNO's betweenness centrality value is 0.000
antitop_skaters_by_eigenvectors = get_top_n_by_centrality(eigenvector_centrality_values, centrality_name='eigenvectors centrality', descending=False)
Kaetlyn OSMOND's eigenvectors centrality value is 0.000
Cong HAN's eigenvectors centrality value is 0.002
Deanna STELLATO-DUDEK's eigenvectors centrality value is 0.005
Bruno MASSOT's eigenvectors centrality value is 0.007
Niccolo MACII's eigenvectors centrality value is 0.008
Analysis¶
First of all, all top-centrality skaters competed and achieved significant results during the last three Olympic cycles (2015–2025). The oldest is Elizaveta Tuktamysheva, born in 1996; the youngest is Isabeau Levito, born in 2007. In contrast, the majority of antitop skaters have already ended their careers. This suggests that athlete age may be an important factor influencing centrality, which makes sense, since younger people are more active on Instagram.
Furthermore, all top-centrality skaters have participated in the World Championships; all but Diana Davis have won at least one medal, and most have Olympic experience (exceptions: Elizaveta Tuktamysheva; Ilia Malinin and Isabeau Levito have already qualified for the 2026 Olympic Games in Milano). This explains why they are hubs in many ways. However, some antitop skaters have impressive achievements too: Yuzuru Hanyu and Yuna Kim are among the most decorated athletes of all time. Interestingly, they rank low in betweenness centrality because they follow very few (Kim) or no (Hanyu) other skaters. Paradoxically, this can be explained by their extreme popularity: they use their Instagram accounts not for direct communication with other skaters but for advertising, as they are highly sought after in that sphere (e.g., Yuna Kim is the face of Nike and Samsung).
It can be noticed that Elizaveta Tuktamysheva ranks top-1 in all centrality measures except out-degree centrality, where Anna Shcherbakova is number one. This can be explained by multiple factors:
- Elizaveta Tuktamysheva began her senior career in 2012 and has competed (and therefore become acquainted) with many older skaters whom younger athletes like Shcherbakova, Trusova, or Levito have not encountered;
- unlike Evgenia Medvedeva, who ended her career in 2019, Elizaveta Tuktamysheva continued competing until autumn 2023 (and officially retired only in December 2025). Thus, she became acquainted with the younger generation and competed against them. Both this factor and the previous one make it clear that Elizaveta Tuktamysheva is a major hub. This is also indirectly proven by the fact that she has the longest career among all athletes except Alyona Savchenko.
Let's visualize the top skaters and their edges based on different centrality measures.
def get_viz_params(G, centrality_measure_for_all_skaters):
    inG = G.to_undirected()
    cent_dict = dict(centrality_measure_for_all_skaters)
    disciplines = [inG.nodes[node]['discipline'] for node in inG.nodes()]
    unique_disciplines = sorted(set(disciplines))  # sorted so the color mapping is reproducible
    discipline_mapping = {disc: i for i, disc in enumerate(unique_disciplines)}
    colormap = plt.colormaps.get_cmap('Pastel1')
    node_colors = [colormap(discipline_mapping[disc]) for disc in disciplines]
    scores = [inG.nodes[node]['score'] for node in inG.nodes()]
    min_size, max_size = 5, 1000
    score_norm = Normalize(vmin=min(scores), vmax=max(scores))
    node_sizes = [min_size + (max_size - min_size) * score_norm(score) for score in scores]
    highlighted_nodes = []
    highlighted_labels = {}
    for node in inG.nodes():
        if node in cent_dict:
            highlighted_nodes.append(node)
            highlighted_labels[node] = inG.nodes[node].get('name', node)
    edge_alpha = []
    edge_width = []
    edge_color = []
    for u, v in inG.edges():
        if u in highlighted_nodes and v in highlighted_nodes:
            edge_alpha.append(0.5)
            edge_width.append(1.5)
            edge_color.append("#ffdee8")
            inG[u][v]['kawaii_weight'] = 20.
        elif u in highlighted_nodes or v in highlighted_nodes:
            edge_alpha.append(0.5)
            edge_width.append(1.5)
            edge_color.append("#ffdee8")
            inG[u][v]['kawaii_weight'] = 2.
        else:
            edge_alpha.append(0.5)
            edge_width.append(0.5)
            edge_color.append("#caf1ff")
            inG[u][v]['kawaii_weight'] = 1
    layout = nx.kamada_kawai_layout(inG, weight='kawaii_weight')
    all_params = {'nodes_params': {'G': inG,
                                   'pos': layout,
                                   'node_size': node_sizes,
                                   'node_color': node_colors,
                                   'alpha': 0.5},
                  'edges_params': {'G': inG,
                                   'pos': layout,
                                   'alpha': edge_alpha,
                                   'width': edge_width,
                                   'edge_color': edge_color},
                  'labels_params': {'G': inG,
                                    'pos': layout,
                                    'labels': highlighted_labels}}
    return all_params
cl = get_top_n_by_centrality(in_degree_centrality_values, n=5, centrality_name='in-degree centrality', print_things=False)
all_params = get_viz_params(G, cl)
plt.figure(figsize=(10, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
mpatches.Patch(color='#defcce', label="Ice Dance"),
mpatches.Patch(color='#ffe6e6', label="Men's Single"),
mpatches.Patch(color='#e9dbf9', label="Women's Single"),
mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/in_degree_graph.png", dpi=400)
plt.show()
cl = get_top_n_by_centrality(out_degree_centrality_values, n=5, centrality_name='out-degree centrality', print_things=False)
all_params = get_viz_params(G, cl)
plt.figure(figsize=(15, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
mpatches.Patch(color='#defcce', label="Ice Dance"),
mpatches.Patch(color='#ffe6e6', label="Men's Single"),
mpatches.Patch(color='#e9dbf9', label="Women's Single"),
mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/out_degree_graph.png", dpi=400)
plt.show()
cl = get_top_n_by_centrality(closeness_centrality_values, n=5, centrality_name='closeness centrality', print_things=False)
all_params = get_viz_params(G, cl)
plt.figure(figsize=(10, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
mpatches.Patch(color='#defcce', label="Ice Dance"),
mpatches.Patch(color='#ffe6e6', label="Men's Single"),
mpatches.Patch(color='#e9dbf9', label="Women's Single"),
mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/closeness_graph.png", dpi=400)
plt.show()
cl = get_top_n_by_centrality(betweenness_centrality_values, n=5, centrality_name='betweenness centrality', print_things=False)
all_params = get_viz_params(G, cl)
plt.figure(figsize=(15, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
mpatches.Patch(color='#defcce', label="Ice Dance"),
mpatches.Patch(color='#ffe6e6', label="Men's Single"),
mpatches.Patch(color='#e9dbf9', label="Women's Single"),
mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/betwenness_graph.png", dpi=400)
plt.show()
cl = get_top_n_by_centrality(eigenvector_centrality_values, n=5, centrality_name='eigenvector centrality', print_things=False)
all_params = get_viz_params(G, cl)
plt.figure(figsize=(15, 7))
nx.draw_networkx_nodes(**all_params['nodes_params'])
nx.draw_networkx_edges(**all_params['edges_params'])
nx.draw_networkx_labels(**all_params['labels_params'])
legend_patches = [
mpatches.Patch(color='#defcce', label="Ice Dance"),
mpatches.Patch(color='#ffe6e6', label="Men's Single"),
mpatches.Patch(color='#e9dbf9', label="Women's Single"),
mpatches.Patch(color='#ceeef9', label="Pairs")
]
plt.legend(handles=legend_patches, loc='upper left', frameon=True,
fancybox=True, framealpha=0.9, shadow=True)
plt.savefig("visualisations/eigenvector_graph.png", dpi=400)
plt.show()
PageRank¶
As we have a directed graph, it would be interesting to analyze PageRank values for each node. In our skater network, this translates to identifying skaters who are "central" not just by being followed, but by being followed by other influential skaters.
pagerank_scores = nx.pagerank(G)
top_nodes = sorted(pagerank_scores.items(), key=lambda x: x[1], reverse=True)
for node, score in top_nodes[:10]:
    print(f"{G.nodes[node]['name']}: {score:.3f}")
Evgenia MEDVEDEVA: 0.028
Mark KONDRATIUK: 0.028
Elizaveta TUKTAMYSHEVA: 0.026
Nathan CHEN: 0.026
Anna SHCHERBAKOVA: 0.025
Ilia MALININ: 0.024
Alexandra TRUSOVA: 0.020
Tessa VIRTUE: 0.018
Maxim TRANKOV: 0.017
Victoria SINITSINA: 0.017
# let's also check whether ISU rating scores correlate with PageRank scores
pagerank_values, pearson_corr, p_value = calculate_correlation_coeff(G, 'pagerank')
print("Results for PageRank calculations")
print(f"Pearson correlation: {pearson_corr:.3f}, P-value: {p_value:.3f}")
print("="*43)
Results for PageRank calculations
Pearson correlation: -0.061, P-value: 0.538
===========================================
The emergence of Evgenia Medvedeva and Mark Kondratiuk in the top tier of the PageRank rankings, despite them not being the most-mentioned skaters by raw in-degree (Mark wasn't mentioned at all), highlights the algorithm's ability to identify influence through network structure rather than just popularity.
Medvedeva's high score can be attributed to her position as a hub receiving attention from other influential nodes. Her acclaimed international show career and her training under the legendary coach Brian Orser likely connect her to a network of elite skaters, coaches, and commentators. Follows from these highly connected individuals carry more weight in the PageRank calculation, amplifying her standing.
Similarly, Mark Kondratiuk's presence is driven by his role in high-profile team events, such as the Olympic team competition, which positioned him directly alongside top skaters like Kamila Valieva and Victoria Sinitsina and likely within their follower networks. Being followed by these central figures provides a significant boost in PageRank, as their endorsements are algorithmically valuable.
Community detection¶
I selected Louvain for its efficiency with dense social graphs and its modularity optimization suited to overlapping social circles. Girvan-Newman provides a hierarchical alternative. Both algorithms address directed edges and homogeneous node composition while revealing different structural perspectives.
Louvain algorithm¶
lov_communities = nx_comm.louvain_communities(G, weight=None, seed=seed)
print(f"{nx_comm.modularity(G, lov_communities):.3f}")
0.185
Girvan-Newman algorithm¶
gn_communities = nx.community.girvan_newman(G)
for i in range(4):
    partition = next(gn_communities)
    print(f"{nx_comm.modularity(G, partition):.3f}")
-0.000
Analysis¶
The Louvain algorithm's modularity score of 0.185 indicates weak community structure in this skater network. Girvan-Newman's near-zero score reflects its different approach, as it removes high-betweenness edges, which often fails to capture cohesive communities.
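One way to put the 0.185 figure in context (a sketch, not part of the original analysis) is to compare it against the modularity Louvain finds on degree-preserving randomizations of the graph: if the randomized graphs score similarly, the detected communities carry little signal. The helper below is illustrative and is demonstrated on a standard benchmark graph rather than the skater data.

```python
import networkx as nx
import networkx.algorithms.community as nx_comm

def louvain_modularity_vs_null(G, n_null=5, seed=42):
    """Compare Louvain modularity on G with degree-preserving random rewirings."""
    observed = nx_comm.modularity(G, nx_comm.louvain_communities(G, seed=seed))
    null_scores = []
    for i in range(n_null):
        # configuration model keeps every node's in/out degree but shuffles edges
        H = nx.DiGraph(nx.directed_configuration_model(
            [d for _, d in G.in_degree()],
            [d for _, d in G.out_degree()],
            seed=seed + i))  # DiGraph(...) collapses parallel edges
        H.remove_edges_from(nx.selfloop_edges(H))
        null_scores.append(nx_comm.modularity(H, nx_comm.louvain_communities(H, seed=seed)))
    return observed, null_scores

# Demonstration on a graph with known community structure (not the skater data)
G_demo = nx.karate_club_graph().to_directed()
obs, nulls = louvain_modularity_vs_null(G_demo)
print(f"Observed modularity: {obs:.3f}")
print(f"Null modularities: {[round(s, 3) for s in nulls]}")
```

If the observed score sits clearly above the null distribution, the weak community structure is still real rather than an artifact of the degree sequence; a score inside the null range would suggest no meaningful communities at all.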